Empirical Motivation
Empirical Question
How strongly does pollution affect labor market outcomes?
- We know that pollution is bad for health
- But how does it affect economic activity, particularly earnings and employment?
Challenge: Endogeneity
Cannot just regress labor market outcomes on overall pollution
- Two-way causality, more economically active places tend to have more pollution
- Simple regression will suffer from endogeneity
- Can solve endogeneity with instrumental variables
- But those are difficult to find
Another Approach
- Find pollution not driven by (your own) economic activity
- But some places may be more likely to have this pollution \(\Rightarrow\) this would affect decisions of people to live there
What if we could control for this likelihood?
- How — topic of lecture
- Application: how Borgschulte, Molitor, and Zou (2024) solve the issue
Motivation and Questions
Reminder: TWFE
Recall: for difference-in-differences we showed that \[ \small \widehat{ATT}^{DiD} = \hat{\delta} \] where \(\hat{\delta}\) is the OLS estimator in the regression \[\small Y_{i2}- Y_{i1} = \gamma + \delta D_{i2} + U_{i2} \tag{1}\] with \(\delta\) the ATT and \(\gamma\) the average change in outcomes (trend)
Reminder: More General Equation
Equation 1 obtained by differencing the two-way fixed effect equation: \[ Y_{it} = \alpha_i + \gamma_t + \delta D_{it} + U_{it}, \tag{2}\] where \[ \gamma_1 = 0, \quad \gamma_2 = \gamma \] and \(\alpha_i = Y_{i1}^0\) — baseline differences between units
Reminder: Estimation
- We know how to apply OLS based on Equation 1: just regress \((Y_{i2}-Y_{i1})\) on \((1, D_{i2})\) with OLS
- Last time said that can also apply OLS based on Equation 2 directly (e.g. PanelOLS from linearmodels)
  - Treated \(\alpha_i\) as parameters
- Got exactly the same results from two approaches
Lecture Questions
- How to apply OLS on Equation 2?
- What are the regressors? How to “treat \(\alpha_i\) as parameters”?
- How does it work in practice?
- Is the estimator inspired by Equation 2 always equal to the one based on Equation 1?
- Can we apply the same approach with general (not just 0/1) treatment? What are the causal properties?
Fixed Effect Estimation
Random Intercept (Fixed Effect) Models
First Goal: Vector-Matrix Representation
First question: “treating \(\alpha_i\) as parameters”?
For now forget about \(D_{it}\) and \(\gamma\) and consider: \[ Y_{it} = \alpha_i + U_{it}, \quad i=1,\dots, N; t=1, \dots, T \tag{3}\] \(\alpha_i\) — individual-specific intercept (“unit fixed effect”). Data assumed balanced (same \(T\) for all units)
Want to represent Equation 3 in vector-matrix form
Vector-Matrix Forms for Panel Data I
Before that: more info on matrix forms for panel data.
Vector form as before: single observation (now fixed \(i\) and \(t\)) with vector of covariates: \[ Y_{it} = \bX_{it}'\bbeta + U_{it} \]
Vector-Matrix Forms for Panel Data II
Two key matrix forms:
Individual level. Let \(\bY_i = (Y_{i1}, \dots, Y_{iT})'\), \(\bX_i = (\bX_{i1}, \dots, \bX_{iT})'\), then \[\small \bY_i = \bX_i\bbeta + \bU_i \]
Full sample. Let \(\bY = (\bY_1', \dots, \bY_N')'\), \(\bX= (\bX_1', \dots, \bX_N')'\). Then \[ \small \bY = \bX\bbeta + \bU \] What are the dimensions of \(\bY_i, \bX_i, \bY, \bX\)?
Individual Matrix Form with Individual Intercepts
Model (3) in individual matrix form: \[ \bY_i = \mathbf{1}_T\alpha_i + \bU_i \] where \(\mathbf{1}_T\) — \(T\)-vector of ones
Not that insightful
Full Sample Matrix Form with Individual Intercepts
Model (3) in full sample matrix form \[ \begin{aligned} \bY & = \bF \bLambda + \bU, \\ \bLambda & = (\alpha_1, \dots, \alpha_N)', \\ \bF & = \bI_N \otimes \mathbf{1}_T, \end{aligned} \tag{4}\] where \(\otimes\) is the Kronecker product. Intuition:
- There are \(N\) regressors, \(i\)th regressor is the dummy of being the \(i\)th unit (0/1 regressor values)
- \(\bLambda\) — associated parameter vector
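A minimal NumPy sketch of Equation 4 for a toy panel may help fix ideas. The sizes `N`, `T` and the values in `Lam` are made up for illustration; `np.kron` computes the Kronecker product.

```python
import numpy as np

N, T = 3, 2
F = np.kron(np.eye(N), np.ones((T, 1)))   # I_N (x) 1_T, shape (N*T, N)
Lam = np.array([10.0, 20.0, 30.0])        # hypothetical alpha_1, alpha_2, alpha_3

print(F)         # column i is the dummy for unit i
print(F @ Lam)   # each alpha_i repeated T times, in the stacking order of Y
```

Row \((i, t)\) of `F` has a single 1 in column \(i\), so `F @ Lam` simply picks out \(\alpha_i\) for every observation of unit \(i\).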
More Complex Example: Two-Way Intercept Model
Now consider more general model: \[ Y_{it} = \alpha_i + \gamma_t + U_{it} \] Here want to treat both \(\alpha_i\) and \(\gamma_t\) as parameters
Individual matrix form: \[ \begin{aligned} \bY_i & = \mathbf{1}_T \alpha_i + \bI_T \bgamma + \bU_i\\ \bgamma & = (\gamma_1, \dots, \gamma_T)' \end{aligned} \]
Two-Way Model: Full Sample Matrix Form
Can write \[ \begin{aligned} \bY & = \bF\bLambda + \bU, \\ \bF & = \left(\bI_N \otimes \mathbf{1}_T, \mathbf{1}_N\otimes \bI_T \right)\\ \bLambda & = (\alpha_1, \dots, \alpha_N, \gamma_1, \dots, \gamma_T)' \end{aligned} \]
- Both \(\alpha_{\cdot}\) and \(\gamma_{\cdot}\) treated as parameters
- Regressors in \(\bF\): \(N\) dummy regressors from before; \(T\) new dummy regressors, \(t\)th new regressor — indicator of \(t\)th period
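The two-way design matrix can be built the same way. The sketch below (toy sizes, chosen only for illustration) also checks its rank: the unit dummies and the period dummies each sum to a vector of ones, so the \(N+T\) columns are not linearly independent. This is why a normalization such as \(\gamma_1 = 0\) in Equation 2 is needed to pin down the levels of the intercepts.

```python
import numpy as np

N, T = 4, 3
unit_dummies = np.kron(np.eye(N), np.ones((T, 1)))   # I_N (x) 1_T
time_dummies = np.kron(np.ones((N, 1)), np.eye(T))   # 1_N (x) I_T
F = np.hstack([unit_dummies, time_dummies])          # shape (N*T, N+T)

rank_F = np.linalg.matrix_rank(F)
print(F.shape, rank_F)   # rank is N + T - 1, one less than the number of columns
```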
Adding Other Covariates
Can write Equation 2 as \[ Y_{it} = \alpha_i + \gamma_t + \bX_{it}'\bbeta + U_{it}, \] for \(\bX_{it} = (D_{it})\) and \(\bbeta = (\delta)\)
More generally, consider any vector \(\bX_{it}\) — not just binary treatments
Its matrix form is \[ \bY = \bF\bLambda + \bX\bbeta + \bU \]
Random-Intercept (Fixed Effects) Models
Definition 1 Models of the kind \[ \small \bY = \bF\bLambda +\bX\bbeta + \bU, \tag{5}\] where \(\bF\) is a matrix of 0s and 1s are called fixed effects or random intercept models
- Fixed effects and random intercepts — often used interchangeably
- Random intercepts — less ambiguous
Examples of Model (5)
- Individual fixed effects/intercepts (one-way) \[ Y_{it} = \alpha_i + \bX_{it}'\bbeta + U_{it} \]
- Two-way models (time and individual effects): \[ Y_{it} = \alpha_i + \gamma_t + \bX_{it}'\bbeta + U_{it} \]
- Can include more complicated effects, see empirical illustration (where \(i\) — US counties, \(t\) — quarters; effects — county-season and state-year)
Fixed Effect (Within) Estimators
Estimation Strategies
Suppose \(\E[U_{it}|\bX_i]=0\). How to estimate parameters of Model (5)?
There are two main strategies:
- Estimate including \(\bF\) and \(\bX\) as regressors — least squares dummy variable (LSDV) estimator
- Get rid of \(\bF\), estimate after — within estimator
LSDV Estimation
LSDV — simply regress \(\bY\) on \((\bF, \bX)\): \[ (\hat{\bLambda}, \hat{\bbeta}^{LSDV}) = \argmin_{\bL, \bb} \norm{\bY - \bF\bL -\bX\bb }_2^2 \]
For example with two-way effects:
\[\small \begin{aligned} & \left(\hat{\alpha}_1, \dots, \hat{\alpha}_N, \hat{\gamma}_1, \dots, \hat{\gamma}_T, \hat{\bbeta}^{LSDV} \right)\\ & = \argmin_{a_1, \dots, a_N, g_1, \dots, g_T, \bb}\sum_{i=1}^N \sum_{t=1}^T \left(Y_{it} - a_i - g_t - \bX_{it}'\bb \right)^{2} \end{aligned} \]
Within Estimation I: One-Way Transformation
First consider one-way model \(Y_{it} = \alpha_{i} + \bX_{it}'\bbeta + U_{it}\). For \(W_{it} \in \{Y_{it}, \bX_{it}, U_{it}\}\), define the (one-way) within-transformed version of \(W_{it}\) as \[ \small \tilde{W}_{it} = W_{it} - \dfrac{1}{T}\sum_{s=1}^T W_{is} \tag{6}\]
Within transformation eliminated fixed effects (=\(\bF\)) \[ \small \tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it} \]
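To see why the fixed effect drops out, write \(\bar{W}_i = T^{-1}\sum_{s=1}^T W_{is}\) for the time average (a shorthand introduced only for this step). Averaging the one-way model over \(t\) for a fixed \(i\) gives \[ \small \bar{Y}_i = \alpha_i + \bar{\bX}_i'\bbeta + \bar{U}_i \] because \(\alpha_i\) does not vary with \(t\). Subtracting this from the original equation yields \[ \small Y_{it} - \bar{Y}_i = (\bX_{it} - \bar{\bX}_i)'\bbeta + (U_{it} - \bar{U}_i), \] which is the transformed equation with \(\alpha_i\) gone.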
Within Estimation II: Two-Way Transformations
Suppose \(Y_{it} = \alpha_i + \gamma_t + \bX_{it}'\bbeta + U_{it}\). Define (two-way) within-transformed variables as \[ \small \tilde{W}_{it} = W_{it} - \dfrac{1}{T}\sum_{s=1}^T W_{is} - \dfrac{1}{N} \sum_{j=1}^N W_{jt} + \dfrac{1}{NT} \sum_{j=1}^N \sum_{s=1}^T W_{js} \tag{7}\]
Again eliminated fixed effects (=\(\bF\)) \[ \small \tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it} \]
Can find the matrix formula in section 3.2 of Baltagi (2021)
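Transformations (6) and (7) can be sketched in pandas with `groupby(...).transform("mean")`. Everything below is an illustration on simulated data: the column names, panel sizes and helper functions `within_one_way` and `within_two_way` are hypothetical, and the two-way formula assumes a balanced panel.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
N, T = 5, 4
df = pd.DataFrame({
    "unit": np.repeat(np.arange(N), T),
    "time": np.tile(np.arange(T), N),
})
alpha = rng.normal(size=N)   # hypothetical unit effects
gamma = rng.normal(size=T)   # hypothetical time effects
df["alpha_i"] = alpha[df["unit"].to_numpy()]
df["effects"] = df["alpha_i"] + gamma[df["time"].to_numpy()]
df["x"] = rng.normal(size=N * T)

def within_one_way(w, unit):
    """Equation (6): subtract unit-specific time averages."""
    return w - w.groupby(unit).transform("mean")

def within_two_way(w, unit, time):
    """Equation (7): subtract unit and period means, add back the grand mean."""
    return (w - w.groupby(unit).transform("mean")
              - w.groupby(time).transform("mean") + w.mean())

alpha_tilde = within_one_way(df["alpha_i"], df["unit"])
eff_tilde = within_two_way(df["effects"], df["unit"], df["time"])
x_tilde = within_two_way(df["x"], df["unit"], df["time"])
print(alpha_tilde.abs().max(), eff_tilde.abs().max())   # both numerically zero
```

The transformed regressor `x_tilde` keeps only variation that is neither common to a unit nor common to a period.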
More General Within Transformation
Consider general model: \[ \small \bY = \bF\bLambda + \bX\bbeta + \bU \]
There exists a linear transformation that eliminates \(\bF\): \[ \small \tilde{\bY} = \tilde{\bX}\bbeta + \tilde{\bU} \]
Called the FWL or the (generalized) within transformation
These transformations are just an application of the Frisch-Waugh-Lovell (“anatomy of regression”) theorem. See E1a in Wooldridge (2020)
Within Estimation
Within estimation: just regressing \(\tilde{\bY}\) on \(\tilde{\bX}\) with OLS: \[ \hat{\bbeta}^{W} = \argmin_{\bb} \sum_{i=1}^N \sum_{t=1}^T (\tilde{Y}_{it} - \tilde{\bX}_{it}'\bb)^2 \]
- \(\tilde{\bX}\) must have maximum column rank (no collinearity in transformed regressors)
- Intuition in one-way case (only \(\alpha_i\)): some variation in \(\bX_{it}\) over time
Equivalence of Approaches
Proposition 1 \[ \hat{\bbeta}^{LSDV} = \hat{\bbeta}^{W} \]
- Both approaches: same estimated values
- Explains why we got same results from two regression approaches to DiD last time
- Allows using a single name for both estimators. Usually called fixed effects or random intercept estimators
Not examinable: proof is a consequence of the Frisch-Waugh-Lovell theorem (E1a in Wooldridge 2020)
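Proposition 1 can be checked numerically. The sketch below simulates a two-period DiD design (all parameter values are arbitrary simulation choices) and computes the treatment coefficient three ways: LSDV with explicit dummies, two-way within estimation, and the first-difference regression of Equation 1.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 2
treated = rng.integers(0, 2, size=N)
D = np.column_stack([np.zeros(N), treated]).astype(float)   # D_i1 = 0 for all units
alpha = rng.normal(size=N) + treated                         # unit effects correlated with treatment
Y = alpha[:, None] + np.array([0.0, 0.5]) + 2.0 * D + rng.normal(size=(N, T))

# (a) LSDV: regress stacked Y on unit dummies, time dummies and D
F = np.hstack([np.kron(np.eye(N), np.ones((T, 1))),
               np.kron(np.ones((N, 1)), np.eye(T))])
Z = np.column_stack([F, D.reshape(-1)])
# lstsq copes with the collinear dummies; the coefficient on D is still unique
delta_lsdv = np.linalg.lstsq(Z, Y.reshape(-1), rcond=None)[0][-1]

# (b) Within: two-way demean (Equation 7), then OLS without dummies
def two_way_demean(W):
    return W - W.mean(axis=1, keepdims=True) - W.mean(axis=0, keepdims=True) + W.mean()

D_t, Y_t = two_way_demean(D), two_way_demean(Y)
delta_within = (D_t * Y_t).sum() / (D_t ** 2).sum()

# (c) First differences: regress Y_i2 - Y_i1 on (1, D_i2)
X_fd = np.column_stack([np.ones(N), D[:, 1]])
delta_fd = np.linalg.lstsq(X_fd, Y[:, 1] - Y[:, 0], rcond=None)[0][1]

print(delta_lsdv, delta_within, delta_fd)   # identical up to floating-point error
```

The agreement between (a) and (b) is Proposition 1. The agreement with (c) is specific to the balanced two-period case.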
Which Approach to Use?
When to use LSDV vs. within estimation?
- LSDV: only when you care about \(\bLambda\) and its elements have some economic meaning. Example: \(\alpha_i\) is the innate skill of worker \(i\) (e.g. de la Roca and Puga 2017)
- Within: in all other cases
Sometimes impossible to compute LSDV estimator: number of fixed effects is too large to even simply store the data matrix:
- Called the “high-dimensional” fixed effect case
- In practice the within transformation is cleverly done indirectly (Correia 2016)
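One simple way to remove several sets of intercepts without ever forming \(\bF\) is to demean by each grouping in turn until nothing changes (alternating projections). The sketch below illustrates that idea on simulated data; it is not the algorithm of Correia (2016), which is considerably more refined, and the group ids `g1`, `g2` and the helper `absorb` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, G1, G2 = 2000, 50, 30
g1 = rng.integers(0, G1, size=n)          # first fixed-effect id (hypothetical)
g2 = rng.integers(0, G2, size=n)          # second fixed-effect id (hypothetical)
x = rng.normal(size=n) + 0.1 * g1 / G1
y = 1.5 * x + rng.normal(size=G1)[g1] + rng.normal(size=G2)[g2] + rng.normal(size=n)

def group_demean(w, g, G):
    """Subtract group means without building dummy columns."""
    sums = np.bincount(g, weights=w, minlength=G)
    counts = np.bincount(g, minlength=G)
    return w - (sums / counts)[g]

def absorb(w, tol=1e-12, max_iter=1000):
    """Alternate demeaning by g1 and g2 until the variable stops changing."""
    w = w.copy()
    for _ in range(max_iter):
        w_new = group_demean(group_demean(w, g1, G1), g2, G2)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

x_t, y_t = absorb(x), absorb(y)
beta_within = (x_t @ y_t) / (x_t @ x_t)

# Check against LSDV with explicit dummies (only feasible because G1 and G2 are small)
Z = np.column_stack([np.eye(G1)[g1], np.eye(G2)[g2], x])
beta_lsdv = np.linalg.lstsq(Z, y, rcond=None)[0][-1]
print(beta_within, beta_lsdv)
```

Memory use grows with the number of observations rather than with the number of intercepts, which is what makes the high-dimensional case tractable.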
Pooled OLS
Another special case of model (5) — pooled OLS:
- No fixed effects, directly regressing \(Y_{it}\) on \(\bX_{it}\)
- Fully ignoring the panel structure
- Involves no transformations and no \(\bF\)
- Interpretation: the least flexible example of (5)
Implementation in Python
- linearmodels supports both LSDV and within transformations
  - Defaults to eliminating effects
  - LSDV can be used with PanelOLS.fit(use_lsdv=True)
- pyfixest was designed for high-dimensional FE estimation (can handle small examples too)
  - LSDV not available (to the best of my knowledge)
  - See empirical application for example usage
Causal Properties with General Treatment
Question: Causal Properties of FE Estimators
So far:
- Now have described the estimation approach underlying DiD regression estimation
- Noticed that it can handle general treatments \(\bX_{it}\), not just scalar binary \(D_{it}\)
What are the causal properties of such estimators? Under which models do they give meaningful results?
Reflection on our Approach
Note:
- Approach this problem differently from past lectures:
- Here first formulate an estimator and only after start understanding causal properties
- Usually goes the other way: fix causal problem, try to figure out identification and estimation
- Doing so for historic reasons — these estimators came first, serious causal thinking more recently
Models Considered
Will consider two kinds of models under strict exogeneity
- Random intercept causal process with same \(\bbeta\) for everyone — reflects estimator structure
- Model with heterogeneous effects \(\bbeta_i\)
Properties under Random Intercept Model
Causal Framework with Random Intercepts
- Some vector of treatments \(\bX_{it}\)
- Potential outcome of unit \(i\) in time \(t\) given by \[ Y^{\bx}_{it} = \alpha_i + \gamma_t + \bx'\bbeta + U_{it} \tag{8}\]
- Will think about exogeneity conditions later
- Object of interest — \(\bbeta\) (plays the role of ATE, ATT, …)
For definiteness, we do two-way effects, but can apply same analysis for any configuration of random intercepts, just need to define \(\tilde{Y}_{it}\) appropriately
Sampling Setting
Work in the following setting
- Units drawn IID
- \(N\) large, \(T\) fixed
- More typical kind of panel data (“micro” panel)
- Contrast with “large” panel data with both \(N\) and \(T\) large
Causal Framework: Discussion of Model (8)
- Contrast with less flexible causal model discussed before: \[ Y_{it}^{\bx} = \bx'\bbeta + U_{it} \]
- Interpretations:
- \(\alpha_i\) — often some characteristic of \(i\) that does not change over \(t\) in sample (e.g. innate intellect)
- \(\gamma_t\) — shocks that affect everyone equally
- Similar logic for other types of random intercepts
Estimator for \(\bbeta\)
- Estimate \(\bbeta\) with the FE estimator, expressed as (Proposition 1) \[ \small \begin{aligned} \hat{\bbeta}^{FE} & = \left(\sum_{i=1}^N \sum_{t=1}^T \tilde{\bX}_{it}\tilde{\bX}_{it}' \right)^{-1}\sum_{i=1}^N \sum_{t=1}^T \tilde{\bX}_{it}\tilde{Y}_{it} \\ & = \left( \sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bY}_i \end{aligned} \] with variables transformed as in Equation 7
- Will discuss nonsingularity assumptions a bit later
Probability Limit of \(\hat{\bbeta}^{FE}\)
Realized data satisfies \(\tilde{Y}_{it} = \tilde{\bX}_{it}'\bbeta + \tilde{U}_{it}\)
So as \(N\to\infty\) and \(T\) is fixed: \[ \small \hat{\bbeta}^{FE} \xrightarrow{p} \bbeta + \left( \E[\tilde{\bX}_{i}'\tilde{\bX}_i]\right)^{-1} \E[\tilde{\bX}_{i}'\tilde{\bU}_{i}] \]
\(T\) fixed: basically treat each unit as a single \(T\)-dimensional observation (as in \(\tilde{\bU}_i\))
Rank (Nonsingularity) Conditions
So far needed to impose nonsingularity conditions
- Sample: on \(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i\)
- Population: on \(\E[\tilde{\bX}_{i}'\tilde{\bX}_i]\)
What does it require of \(\bX_{it}\)?
- No collinearity
- Also variation after the within transformation (one-way: variation over time; two-way: different variation over time for different units; etc.)
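The rank condition is easy to violate with a regressor that never changes within a unit. In the sketch below (simulated data, hypothetical variable names) the one-way within transformation wipes out such a regressor entirely, so the matrix \(\sum_i \tilde{\bX}_i'\tilde{\bX}_i\) is singular.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 100, 4
x_varying = rng.normal(size=(N, T))                        # changes over time
x_fixed = np.repeat(rng.normal(size=(N, 1)), T, axis=1)    # constant within each unit

def one_way_demean(W):
    return W - W.mean(axis=1, keepdims=True)

X_tilde = np.column_stack([one_way_demean(x_varying).reshape(-1),
                           one_way_demean(x_fixed).reshape(-1)])
gram = X_tilde.T @ X_tilde            # sample analogue of sum_i X_i' X_i
rank = np.linalg.matrix_rank(X_tilde)
print(rank)                           # one column survives, the other is all zeros
```

The coefficient on `x_fixed` cannot be separated from \(\alpha_i\), which is the one-way intuition stated above.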
Towards Exogeneity Conditions
For consistency want \[ \small \E[\tilde{\bX}_i'\tilde{\bU}_i] = \sum_{t=1}^T \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] =0 \]
Sufficient that for all \(t\) \[ \small \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] =0 \tag{9}\]
What does this condition require of \(\bX_{it}\) and \(U_{it}\)?
Exogeneity in the One-Way Case
Under one-way transformation (6), Equation 9 becomes \[ \scriptsize \E\left[\tilde{\bX}_{it} \tilde{U}_{it} \right] = \E\left[ \bX_{it}U_{it} - \dfrac{\bX_{it}}{T}\sum_{s=1}^T U_{is} - \dfrac{U_{it}}{T}\sum_{r=1}^T \bX_{ir} + \dfrac{1}{T^2} \sum_{s=1}^T\sum_{r=1}^T \bX_{is} U_{ir}\right] = 0 \]
Here would be sufficient that for all \(t\), \(s\) \[ \small \E[\bX_{it}U_{is}] = 0 \tag{10}\] Intuition: \(\bX_{it}\) and \(U_{is}\) are uncorrelated across all points in time
Problematic direction is usually \(s<t\): predicting future \(\bX_{it}\) from past \(U_{is}\) (your shocks influence your future decisions)
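A small simulation can illustrate the problematic direction. The design below is entirely hypothetical: in the second case \(X_{it}\) reacts to last period's shock \(U_{i,t-1}\), so Equation 10 fails for \(s<t\) even though \(X_{it}\) and \(U_{it}\) are uncorrelated within the same period.

```python
import numpy as np

def fe_estimate(X, Y):
    """One-way within estimator for a scalar regressor; X and Y have shape (N, T)."""
    X_t = X - X.mean(axis=1, keepdims=True)
    Y_t = Y - Y.mean(axis=1, keepdims=True)
    return (X_t * Y_t).sum() / (X_t ** 2).sum()

rng = np.random.default_rng(4)
N, T, beta = 20000, 3, 1.0
alpha = rng.normal(size=(N, 1))
U = rng.normal(size=(N, T))
noise = rng.normal(size=(N, T))

# Case 1: X depends on alpha but on no shock, so Equation (10) holds
X_exog = alpha + noise
# Case 2: feedback, X_it also reacts to last period's shock U_{i,t-1}
X_feed = X_exog.copy()
X_feed[:, 1:] += 0.8 * U[:, :-1]

beta_exog = fe_estimate(X_exog, alpha + beta * X_exog + U)
beta_feed = fe_estimate(X_feed, alpha + beta * X_feed + U)
print(beta_exog, beta_feed)   # first is close to 1, second is visibly below 1
```

The bias in the second case comes from the time averages inside the within transformation, which mix shocks from all periods into every observation. With \(T\) fixed it does not vanish as \(N\) grows.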
Strict Exogeneity in the Panel Case
What about beyond one-way effects? Usually impose an assumption that covers all cases, the panel data version of strict exogeneity:
Assumption (strict exogeneity): \[ \E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0 \]
Much stronger than just \(\E[U_{it}|\bX_{it}]=0\)
Strict Exogeneity Implies Equation 9
Proposition 2 Let \(\E[U_{is} | \bX_{i1}, \dots, \bX_{iT}] =0\) for all \(s\). Then for any within transformation it holds for all \(t\) that \[ \E[\tilde{\bX}_{it}\tilde{U}_{it}] =0 \]
- Proof by properties of conditional expectations
- Covers all configurations of random intercepts
See first block for key properties of conditional expectation
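A sketch of the argument in the one-way case, assuming transformation (6): \(\tilde{\bX}_{it}\) is a function of \((\bX_{i1}, \dots, \bX_{iT})\) only, so by the law of iterated expectations \[ \small \E[\tilde{\bX}_{it}\tilde{U}_{it}] = \E\left[\tilde{\bX}_{it}\left(\E[U_{it}|\bX_{i1}, \dots, \bX_{iT}] - \dfrac{1}{T}\sum_{s=1}^T \E[U_{is}|\bX_{i1}, \dots, \bX_{iT}]\right)\right] = 0 \] For transformations that also average across units (e.g. two-way), the same steps apply after conditioning on the regressors of all units: with IID units, \(\E[U_{is}|\bX_1, \dots, \bX_N] = \E[U_{is}|\bX_{i}]\).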
Consistency Result
Proposition 3 Let
- \((\bX_i, \bU_i)\) be IID and model (8) be true
- \(\E[\tilde{\bX}_i'\tilde{\bX}_i]\) exist and be invertible
- \(\E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0\)
Then as \(N\to\infty\)
\[ \hat{\bbeta}^{FE} \xrightarrow{p} \bbeta \]
Asymptotic Distribution
Proposition 4 Let
- \((\bX_i, \bU_i)\) be IID and model (8) be true
- \(\E[\tilde{\bX}_i'\tilde{\bX}_i]\) exist and be invertible; \(\E\left[\norm{\tilde{\bX}_i'\tilde{\bU}_i}^2\right]<\infty\)
- \(\E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0\)
Then as \(N\to\infty\)
\[ \scriptsize \sqrt{N}\left(\hat{\bbeta}^{FE} - \bbeta\right) \xrightarrow{d} N\left(0, \left(\E[\tilde{\bX}_i'\tilde{\bX}_i] \right)^{-1} \E[\tilde{\bX}_i'\tilde{\bU}_i\tilde{\bU}_i'\tilde{\bX}_i] \left(\E[\tilde{\bX}_i'\tilde{\bX}_i] \right)^{-1}\right) \]
Discussion of Asymptotic Results for \(\hat{\bbeta}^{FE}\)
- Can do inference and estimate errors in more or less standard way (confidence intervals, hypothesis tests, …)
- Proof of asymptotic normality — exercise
- Not examinable technical point: some within transformations (e.g. two-way) can create dependence across \(i\). This dependence disappears as \(N\to\infty\) and does not affect asymptotics
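The sandwich form in Proposition 4 suggests a plug-in variance estimator that treats each unit as one observation (clustering by unit). The NumPy sketch below is one such implementation on simulated data, with no small-sample corrections. All names and parameter values are illustrative, and packaged routines may apply different degrees-of-freedom adjustments.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, beta = 2000, 4, np.array([1.0, -0.5])
K = beta.size
alpha = rng.normal(size=(N, 1, 1))
X = rng.normal(size=(N, T, K)) + alpha                    # regressors correlated with alpha_i
U = rng.normal(size=(N, T)) * (1 + np.abs(X[:, :, 0]))    # heteroskedastic, mean zero given X
Y = alpha[:, :, 0] + X @ beta + U

# One-way within transformation (Equation 6)
X_t = X - X.mean(axis=1, keepdims=True)
Y_t = Y - Y.mean(axis=1, keepdims=True)

A = np.einsum("itk,itl->kl", X_t, X_t)                    # sum_i X_i' X_i
beta_hat = np.linalg.solve(A, np.einsum("itk,it->k", X_t, Y_t))

# Residuals and unit-level scores X_i' U_i
resid = Y_t - X_t @ beta_hat
scores = np.einsum("itk,it->ik", X_t, resid)              # shape (N, K)
B = scores.T @ scores                                     # sum_i X_i' U_i U_i' X_i

A_inv = np.linalg.inv(A)
V_hat = A_inv @ B @ A_inv                                 # estimated variance of beta_hat
se = np.sqrt(np.diag(V_hat))
ci = np.column_stack([beta_hat - 1.96 * se, beta_hat + 1.96 * se])
print(beta_hat, se)
```

Because `A` and `B` are sums rather than averages, `V_hat` already estimates the variance of \(\hat{\bbeta}^{FE}\) itself, not of \(\sqrt{N}(\hat{\bbeta}^{FE} - \bbeta)\).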
Properties under Heterogeneous Coefficient Model
Causal Framework with Heterogeneous Coefficients
Consider different potential outcomes setting: \[ \small Y_{it}^{\bx} = \bx'\bbeta_{\textcolor{teal}{i}} + U_{it} \tag{11}\] under strict exogeneity \(\E[U_{it} | \bX_{i1}, \dots, \bX_{iT}] =0\)
- Special case: unit-specific intercepts
- Does not nest two-way or other random intercepts with time variation
- (11) interesting because it allows heterogeneous effects of same change in \(\bX_{it}\)
FE Estimator
Can still use the FE estimator: \[ \hat{\bbeta}^{FE} = \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bY}_i \] Can do any transformation such that \(\E[\tilde{\bX}_i'\tilde{\bX}_i]\) is invertible
What does \(\hat{\bbeta}^{FE}\) do under model (11)?
Expanding Estimator and Taking Limits
Substituting model (11) gets us \[ \small \begin{aligned} \hat{\bbeta}^{FE} & = \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i\bbeta_i + \left(\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bX}_i \right)^{-1}\sum_{i=1}^N \tilde{\bX}_i'\tilde{\bU}_i\\ & \xrightarrow{p} \E\left[ \bW(\tilde{\bX}_i) \bbeta_i\right] \end{aligned} \] for \(\small\bW(\tilde{\bX}_i) = \left(\E\left[\tilde{\bX}_i'\tilde{\bX}_i\right] \right)^{-1} \tilde{\bX}_i'\tilde{\bX}_i\)
Discussion of \(\hat{\bbeta}^{FE}\) under Heterogeneous Effects
- Result: FE estimator estimates a weighted average of \(\bbeta_i\)
- Weights are positive definite and sum to one
- Weights depend on within transformation used
- In general \(\E\left[ \bW(\tilde{\bX}_i) \bbeta_i\right]\neq \E[\bbeta_i]\) (except RCTs)
- Careful: \(\E\left[ \bW(\tilde{\bX}_i) \bbeta_i\right]\) and \(\E[\bbeta_i]\) can even have different signs sometimes
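The gap between the FE limit and \(\E[\bbeta_i]\) can be seen in a simulation with a scalar regressor. The two unit types below are hypothetical: units whose \(X_{it}\) varies a lot over time also have the larger effect, so they receive more weight.

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 20000, 4
# Half the units vary X a lot and have a large effect,
# the other half vary X little and have no effect
high = np.arange(N) < N // 2
beta_i = np.where(high, 2.0, 0.0)
sd_x = np.where(high, 2.0, 0.5)

X = sd_x[:, None] * rng.normal(size=(N, T))
Y = rng.normal(size=(N, 1)) + beta_i[:, None] * X + rng.normal(size=(N, T))

X_t = X - X.mean(axis=1, keepdims=True)
Y_t = Y - Y.mean(axis=1, keepdims=True)
beta_fe = (X_t * Y_t).sum() / (X_t ** 2).sum()

# Sample analogue of E[W(X_i) beta_i]: weights proportional to within-unit variation in X
w = (X_t ** 2).sum(axis=1)
weighted_avg = (w * beta_i).sum() / w.sum()

print(beta_i.mean(), beta_fe, weighted_avg)
```

In this design the average effect is 1 by construction, while the FE estimate tracks the variance-weighted average, which lies much closer to the effect of the high-variation units.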
Extension: Factor Models
Can generalize random intercept models to the form \[ Y_{it}^{\bx} = \balpha_i'\bgamma_t + \bx'\bbeta_i + U_{it} \]
- \(\bgamma_t\) — unobserved “factors”, \(\balpha_i\) — factor loadings
- Allows people to react differently to same shocks in \(\bgamma_t\)
- See chapter 29 in Pesaran (2015)
Empirical Application
Context and Setting
Empirical Question
Recall empirical question: How does pollution affect labor market outcomes?
- Want to answer question without instruments
- Need some “economic activity-exogenous” measure of pollution
- Also need to control “predisposition” to such pollution
Data
Use data from Borgschulte, Molitor, and Zou (2024)
- Quarterly data for all 3142 counties in the US over 2007-2019 (\(T\approx 48\))
- Data on average earnings and employment in county
Loading and preparing data
# Load data
import pandas as pd

columns_to_load = [
    "countyfip",         # FIPS
    "rfrnc_yr",          # Year
    "rfrnc_qtroy",       # Quarter
    "d_pc_qwi_payroll",  # Earnings (annual diff)
    "hms_deep",          # Number of smoke days
    "fe_countyqtroy",    # County-quarter intercept id
    "fe_styr",           # State-year intercept id
    "fe_stqtros",        # State-quarter intercept id
    "seer_pop",          # Population
]
county_df = pd.read_csv(
    "data/county_quarter.tab",
    sep="\t",
    usecols=columns_to_load,
)
# Rename columns
column_rename_dict = {
    "countyfip": "fips",
    "rfrnc_yr": "year",
    "rfrnc_qtroy": "quarter",
    "d_pc_qwi_payroll": "diff_payroll",
    "hms_deep": "smoke_days",
    "fe_countyqtroy": "fe_id_county_quarter",
    "fe_styr": "fe_id_state_year",
    "fe_stqtros": "fe_id_state_quarter",
    "seer_pop": "population",
}
county_df = county_df.rename(columns=column_rename_dict)
# Drop rows with missing values, then view data
county_df.dropna(inplace=True)
county_df.head(2)

| | fips | year | quarter | smoke_days | population | fe_id_state_year | fe_id_county_quarter | fe_id_state_quarter | diff_payroll |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 1001 | 2007 | 1 | 0.0 | 52405.0 | 2 | 1 | 5 | 25.800293 |
| 5 | 1001 | 2007 | 2 | 9.0 | 52405.0 | 2 | 2 | 6 | 33.243042 |
Pollution Measure
Pollution — number of smoke days because of wildfires
- Wildfire smoke can travel far — “exogenous”
- Some places at great risk of fires or persistent smoke — how to handle impact?
Can download the data from the Harvard dataverse
Distribution of Pollution Measure
Specification and Estimation
Key Variables and Homogeneity Assumption
- Outcome: change in average earnings (employment — exercise)
- Treatment: number of smoke days in quarter
- Analysis level: \(i\) — counties, \(t\) — quarters (no aggregation)
- Assumption: treatment has same effect in all \((i, t)\) and at all levels. Assumed potential outcomes model \[\small (\Delta \text{Earnings}_{it})^{\text{Smoke days}} = \beta\text{Smoke days} + \text{FEs} + U_{it} \]
Random Intercepts
Need to choose random intercepts/FEs so that strict exogeneity holds
Specification of Borgschulte, Molitor, and Zou (2024): include
- County-season intercepts (capture effect of geography, differing per season)
- State-year (capture overall economic trends)
About 13000 different random intercepts (a bit more complicated than \(\alpha_i\) and \(\gamma_t\))
Estimation
- Will use pyfixest this time to estimate
- feols for fixed effect estimation
  - Regression formula in fml, random intercepts after |
Estimation Results
- An additional smoke day reduces quarterly earnings by about $5.20 on average, a statistically significant effect
- Clustered standard errors (p. 77 in Cunningham 2021)
| | diff_payroll |
|---|---|
| | (1) |
| coef | |
| smoke_days | -5.217*** (0.774) |
| fe | |
| fe_id_state_year | x |
| fe_id_county_quarter | x |
| stats | |
| Observations | 160346 |
| S.E. type | by: fips+fe_id_state_quarter |
| R2 | - |

Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001. Format of coefficient cell: Coefficient (Std. Error)
Recap and Conclusions
Recap
In this lecture we
- Discussed fixed effect estimators for DiD and beyond
- LSDV
- Within transformation
- Proved causal properties of FE estimators under
- A random intercept model
- Model with unit-specific coefficients
Next Questions
- What if you want to estimate \(\E[\bbeta_i]\)?
- When does strict exogeneity fail? Can you relax it?
- What does panel data let you do in nonlinear settings?
References
Panel Data: Fixed Effect Estimation